Statistical-Based Abbreviation Expansion

Authors

  • Jan Zelinka
  • Jan Romportl
  • Ludek Müller

Abstract

This paper addresses text normalization for highly inflectional languages, focusing on abbreviation expansion and numeral normalization. Our text normalization system does not use an explicit parser or part-of-speech tagger, and can therefore be called lightly supervised. The standard rule-based text normalization method is compared with the proposed statistical method on the task of expanding Czech abbreviations.

Similar resources

A System for Automatic Abbreviation Expansion

A system for automatic abbreviation expansion was developed and tested for use with an AAC device. The system blends several technologies in a process that automatically expands user-generated abbreviations while also providing spell-checking. Using a series of heuristic rules and a statistical language model, the system combines a series of rule scores and probabilities to rank valid w...

An easily implemented method for abbreviation expansion for the medical domain in Japanese text. A preliminary study.

Background: One of the barriers to the effective use of computerized health-care-related text is the ambiguity of abbreviations. To date, the task of disambiguating abbreviations has been treated as a classification task based on surrounding words. Applying this framework to languages that have no word boundaries requires pre-processing to segment a sentence into separate word sequences....

Automatic expansion of abbreviations by using context and character information

Unknown words such as proper nouns, abbreviations, and acronyms are a major obstacle in text processing. Abbreviations, in particular, are difficult to read and process because they are often domain-specific. In this paper, we propose a method for automatic expansion of abbreviations by using context and character information. In previous studies, dictionaries were used to search for abbreviation ex...

Vocabulary expansion through automatic abbreviation generation for Chinese voice search

Long named entities are often abbreviated in spoken Chinese, and this usually leads to out-of-vocabulary (OOV) problems in speech recognition applications. The generation of Chinese abbreviations is much more complex than that of English abbreviations, most of which are acronyms and truncations. In this paper, we propose a new method for automatically generating abbreviations for Chinese named en...

RePaLi Participation to CLEF eHealth IR Challenge 2014: Leveraging Term Variation

This paper describes the participation of RePaLi, a team composed of members of IRISA, LIMSI and STL, in the biomedical information retrieval challenge proposed in the framework of CLEF eHealth. For this first participation, our approach relies on a state-of-the-art IR system called Indri, based on statistical language modeling, and on semantic resources. The purpose of semantic resources and ...

Publication date: 2011